253 research outputs found

    Pattern-based phylogenetic distance estimation and tree reconstruction

    We have developed an alignment-free method that calculates phylogenetic distances using a maximum likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a dataset of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we inferred trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods on this dataset. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances. Comment: 21 pages, 3 figures, 2 tables

    Trees and networks before and after Darwin

    It is well-known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate description of order in nature, and by about 1780 it had been largely abandoned without a satisfactory alternative having been agreed upon. In 1750 Donati described aquatic and terrestrial organisms as forming a network, and a few years later Buffon depicted a network of genealogical relationships among breeds of dogs. In 1764 Bonnet asked whether the Chain might actually branch at certain points, and in 1766 Pallas proposed that the gradations among organisms resemble a tree with a compound trunk, perhaps not unlike the tree of animal life later depicted by Eichwald. Other trees were presented by Augier in 1801 and by Lamarck in 1809 and 1815, the latter two assuming a transmutation of species over time. Elaborate networks of affinities among plants and among animals were depicted in the late Eighteenth and very early Nineteenth centuries. In the two decades immediately prior to 1837, so-called affinities and/or analogies among organisms were represented by diverse geometric figures. Series of plant and animal fossils in successive geological strata were represented as trees in a popular textbook from 1840, while in 1858 Bronn presented a system of animals, as evidenced by the fossil record, in the form of a tree. Darwin's 1859 tree and its subsequent elaborations by Haeckel came to be accepted in many but not all areas of biological sciences, while network diagrams were used in others. Beginning in the early 1960s trees were inferred from protein and nucleic acid sequences, but networks were re-introduced in the mid-1990s to represent lateral genetic transfer, increasingly regarded as a fundamental mode of evolution at least for bacteria and archaea. In historical context, then, the Network of Life preceded the Tree of Life and might again supersede it.

    An evaluation of DNA-damage response and cell-cycle pathways for breast cancer classification

    Accurate subtyping or classification of breast cancer is important for ensuring proper treatment of patients and also for understanding the molecular mechanisms driving this disease. While there have been several gene signatures proposed in the literature to classify breast tumours, these signatures show very low overlaps, different classification performance, and not much relevance to the underlying biology of these tumours. Here we evaluate DNA-damage response (DDR) and cell cycle pathways, which are critical pathways implicated in a considerable proportion of breast tumours, for their usefulness and ability in breast tumour subtyping. We think that subtyping breast tumours based on these two pathways could lead to vital insights into molecular mechanisms driving these tumours. Here, we performed a systematic evaluation of DDR and cell-cycle pathways for subtyping of breast tumours into the five known intrinsic subtypes. Homologous Recombination (HR) pathway showed the best performance in subtyping breast tumours, indicating that HR genes are strongly involved in all breast tumours. Comparisons of pathway-based signatures and two standard gene signatures supported the use of known pathways for breast tumour subtyping. Further, the evaluation of these standard gene signatures showed that breast tumour subtyping, prognosis and survival estimation are all closely related. Finally, we constructed an all-inclusive super-signature by combining (union of) all genes and performing a stringent feature selection, and found it to be reasonably accurate and robust in classification as well as prognostic value. Adopting DDR and cell cycle pathways for breast tumour subtyping achieved robust and accurate breast tumour subtyping, and constructing a super-signature which contains feature selected mix of genes from these molecular pathways as well as clinical aspects is valuable in clinical practice. Comment: 28 pages, 7 figures, 6 tables

    A two-phase approach for detecting recombination in nucleotide sequences

    Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach in delineating breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results. Comment: 5 pages, 3 figures. Chan CX, Beiko RG and Ragan MA (2007). A two-phase approach for detecting recombination in nucleotide sequences. In Hazelhurst S and Ramsay M (Eds) Proceedings of the First Southern African Bioinformatics Workshop, 28-30 January, Johannesburg, 9-1

    Quantitative Prediction of miRNA-mRNA Interaction Based on Equilibrium Concentrations

    MicroRNAs (miRNAs) suppress gene expression by forming a duplex with a target messenger RNA (mRNA), blocking translation or initiating cleavage. Computational approaches have proven valuable for predicting which mRNAs can be targeted by a given miRNA, but currently available prediction methods do not address the extent of duplex formation under physiological conditions. Some miRNAs can at low concentrations bind to target mRNAs, whereas others are unlikely to bind within a physiologically relevant concentration range. Here we present a novel approach in which we find potential target sites on mRNA that minimize the calculated free energy of duplex formation, compute the free energy change involved in unfolding these sites, and use these energies to estimate the extent of duplex formation at specified initial concentrations of both species. We compare our predictions to experimentally confirmed miRNA-mRNA interactions (and non-interactions) in Drosophila melanogaster and in human. Although our method does not predict whether the targeted mRNA is degraded and/or its translation to protein inhibited, our quantitative estimates generally track experimentally supported results, indicating that this approach can be used to predict whether an interaction occurs at specified concentrations. Our approach offers a more-quantitative understanding of post-translational regulation in different cell types, tissues, and developmental conditions.
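
As an illustration of the last step described in the abstract above (estimating duplex formation from free energies at given initial concentrations), here is a minimal sketch of a textbook two-species binding equilibrium. It is not the authors' implementation: the function names, the choice of 37 °C, the idea of summing the duplex and site-unfolding energies into one net value, and the example numbers are all assumptions made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def duplex_concentration(delta_g_net, mirna_conc, mrna_conc, temp_k=310.15):
    """Equilibrium duplex concentration (mol/L) for miRNA + mRNA <-> duplex.

    delta_g_net: assumed net free energy of binding in kcal/mol, e.g.
                 duplex formation energy plus the cost of unfolding the site.
    mirna_conc, mrna_conc: initial concentrations in mol/L (both > 0).
    """
    # Association constant K = exp(-dG / RT), in 1/M.
    k_eq = math.exp(-delta_g_net / (R_KCAL * temp_k))
    a, b = mirna_conc, mrna_conc
    # K = x / ((a - x)(b - x))  =>  x^2 - (a + b + 1/K) x + a*b = 0.
    # The physical solution is the smaller root, written in a form that
    # avoids subtracting two nearly equal numbers.
    s = a + b + 1.0 / k_eq
    return 2.0 * a * b / (s + math.sqrt(s * s - 4.0 * a * b))


def duplex_fraction(delta_g_net, mirna_conc, mrna_conc, temp_k=310.15):
    """Fraction of the limiting species that is bound at equilibrium."""
    x = duplex_concentration(delta_g_net, mirna_conc, mrna_conc, temp_k)
    return x / min(mirna_conc, mrna_conc)


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol), chosen only to show the calculation.
    dg_duplex, dg_unfold = -25.0, 8.0
    net = dg_duplex + dg_unfold
    for conc in (1e-12, 1e-9, 1e-6):
        frac = duplex_fraction(net, conc, conc)
        print(f"initial conc {conc:.0e} M: bound fraction {frac:.3f}")
```

The same net free energy can give a bound fraction close to zero or close to one depending on the initial concentrations, which is why a concentration-aware estimate says more than a free-energy threshold alone.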

    Within-species lateral genetic transfer and the evolution of transcriptional regulation in Escherichia coli and Shigella

    Background: Changes in transcriptional regulation underlie many of the phenotypic differences observed within and between species of bacteria. Lateral genetic transfer (LGT) can significantly impact the transcription factor (TF) genes which drive these transcriptional changes. Although much emphasis has been placed on LGT of intact genes, the units of transfer and recombination do not necessarily correspond to regions delineated by exact gene boundaries. Here we apply phylogenetic and network-based methods to investigate the relationship between units of lateral transfer and recombination within the Escherichia coli - Shigella clade and the topological properties of genes in the E. coli transcriptional regulatory network (TRN).

    Understanding cellular function and disease with comparative pathway analysis

    Pathway analysis is important in interpreting the functional implications of high-throughput experimental results, but robust comparison across platforms and species is problematic. A new approach, Pathprinting, provides a cross-platform, cross-species comparative analysis of pathway expression signatures. This method calculates pathway-level statistics from gene expression across nearly 180,000 microarrays in the Gene Expression Omnibus. Pathprinting can accurately retrieve phenotypically similar samples and identify sets of human and mouse genes that are prognostic in cancer.